117 research outputs found

    Analyzing the Amazon Mechanical Turk Marketplace

    Since the concept of crowdsourcing is relatively new, many potential participants have questions about the AMT marketplace. For example, common questions that come up in an 'introduction to crowdsourcing and AMT' session include the following: What type of tasks can be completed in the marketplace? How much does it cost? How fast can I get results back? How big is the AMT marketplace? The answers to these questions remain largely anecdotal, based on personal observations and experiences. To better understand what types of tasks are being completed today using crowdsourcing techniques, we started collecting data about the AMT marketplace. We present a preliminary analysis of the dataset and provide directions for interesting future research.

    Modeling Dependency in Prediction Markets

    In the last decade, prediction markets have become popular forecasting tools in areas ranging from election results to movie revenues and Oscar nominations. One of the features that make prediction markets particularly attractive for decision support applications is that they can be used to answer what-if questions and estimate probabilities of complex events. The traditional approach to answering such questions involves running a combinatorial prediction market, which is not always possible. In this paper, we present an alternative, statistical approach to pricing complex claims, which is based on analyzing co-movements of prediction market prices for basis events. Experimental evaluation of our technique on a collection of 51 InTrade contracts, each representing the Democratic Party nominee winning the Electoral College votes of a particular state, shows that the approach outperforms traditional forecasting methods such as price and return regressions and can be used to extract meaningful business intelligence from raw price data.

    Modeling Volatility in Prediction Markets

    There is now significant experimental evidence of excellent ex-post predictive accuracy in certain types of prediction markets, such as markets for elections. This evidence shows that prediction markets are efficient mechanisms for aggregating information and are more accurate in forecasting events than traditional forecasting methods, such as polls. The interpretation of prediction market prices as probabilities has been extensively studied in the literature; however, little attention has so far been given to understanding the volatility of prediction market prices. In this paper, we present a model of a prediction market with a binary payoff on a competitive event involving two parties. In our model, each party has some underlying "ability" process that describes its ability to win and evolves as an Ito diffusion. We show that if the prediction market for this event is efficient and accurate, the price of the corresponding contract will also follow a diffusion, and its instantaneous volatility is a particular function of the current claim price and its time to expiration. We generalize our results to competitive events involving more than two parties and show that volatilities of prediction market contracts for such events are again functions of the current claim prices and the time to expiration, as well as of several additional parameters (ternary correlations of the underlying Brownian motions). In the experimental section, we validate our model on a set of InTrade prediction markets and show that it is consistent with observed volatilities of contract returns and outperforms the well-known GARCH model in predicting future contract volatility from historical price data. To demonstrate the practical value of our model, we apply it to pricing options on prediction market contracts, such as those recently introduced by InTrade. Other potential applications of this model include detection of significant market moves and improving forecast standard errors.

    Estimating the Socio-Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics

    With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we re-examine the impact of reviews on economic outcomes like product sales and examine how different factors affect social outcomes such as the extent of their perceived usefulness. Our approach explores multiple aspects of review text, at the lexical, grammatical, semantic, and stylistic levels, to identify important text-based features. In addition, we examine multiple reviewer-level features, such as the average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective and highly subjective sentences have a negative effect on product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are considered more informative (or helpful) by the users. By using Random Forest based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. Products that have received widely fluctuating reviews also have reviews of widely fluctuating helpfulness. In particular, we find that highly detailed and readable reviews can have low helpfulness votes in cases when users tend to vote negatively not because they disapprove of the review quality but rather to convey their disapproval of the review polarity. We examine the relative importance of the three broad feature categories, 'reviewer-related' features, 'review subjectivity' features, and 'review readability' features, and find that using any one of the three feature sets yields performance statistically equivalent to using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their socio-economic impact. Our results can have implications for the judicious design of opinion forums.

    Demographics of Mechanical Turk

    We present the results of a survey that collected information about the demographics of participants on Amazon Mechanical Turk, together with information about their level of activity and motivation for working on Amazon Mechanical Turk. We find that approximately 50% of the workers come from the United States and 40% come from India. Country of origin tends to change the motivating reasons for workers to participate in the marketplace. Significantly more workers from India participate on Mechanical Turk because the online marketplace is a primary source of income, while in the US most workers consider Mechanical Turk a secondary source of income. While money is a primary motivating reason for workers to participate in the marketplace, workers also cite a variety of other motivating reasons, including entertainment and education.

    The Dimensions of Reputation in Electronic Markets

    We present a framework for identifying the different dimensions of online reputation and characterizing their influence on the pricing power of sellers. Our theory predicts that sellers with better recorded online reputation can successfully charge higher prices than competing sellers of identical products, and that their pricing power increases with their recorded level of experience. We develop and implement a new text mining technique that identifies and quantitatively assesses dimensions of importance in reputation profiles, and use this technique to create a new data set containing detailed reputation profiles and prices for sellers in over 9,500 transactions for consumer software on Amazon.com's online secondary marketplace. The estimation of a set of econometric models on this data set validates the predictions of our theory and, further, ranks these dimensions of reputation based on their effect on measured seller value, identifying those that have the most significant impact on reputation. This paper is the first study that integrates econometric and text mining techniques toward a more complete analysis of the information captured by reputation systems, and it presents new evidence of the importance of their effective and judicious design. (Information Systems Working Papers Series)

    Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support

    Many online or local data sources provide powerful querying mechanisms but limited ranking capabilities. For instance, PubMed allows users to submit highly expressive Boolean keyword queries, but ranks the query results by date only. However, a user would typically prefer a ranking by relevance, measured by an Information Retrieval (IR) ranking function. The naive approach would be to submit a disjunctive query with all query keywords, retrieve the returned documents, and then re-rank them. Unfortunately, such an operation would be very expensive due to the large number of results returned by disjunctive queries. In this paper, we present algorithms that return the top results for a query, ranked according to an IR-style ranking function, while operating on top of a source with a Boolean query interface with no ranking capabilities (or a ranking capability of no interest to the end user). The algorithms generate a series of conjunctive queries that return only documents that are candidates for being highly ranked according to a relevance metric. Our approach can also be applied to other settings where the ranking is monotonic on a set of factors (query keywords in IR) and the source query interface is a Boolean expression of these factors. Our comprehensive experimental evaluation on the PubMed database and a TREC dataset shows that we achieve an order-of-magnitude improvement compared to the current baseline approaches. Vagelis Hristidis was partly supported by NSF grant IIS-0811922 and DHS grant 2009-ST-062-000016. Panagiotis G. Ipeirotis was supported by the National Science Foundation under Grant No. IIS-0643846.